Overview

Dataset statistics

Number of variables: 19
Number of observations: 10027
Missing cells: 2577
Missing cells (%): 1.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 152.0 B
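
The percentages and memory figures above can be cross-checked with simple arithmetic. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that each value occupies 8 bytes in memory. Both are assumptions made for illustration, not something the report states.

```python
# Figures copied from the dataset statistics above.
n_variables = 19
n_observations = 10027
missing_cells = 2577

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 8 bytes per value (for example a 64-bit float or an object pointer).
bytes_per_value = 8
record_size_bytes = n_variables * bytes_per_value
total_mib = record_size_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Record size: {record_size_bytes} B")
print(f"Total size: {total_mib:.1f} MiB")
```

Under these assumptions the results round to the reported 1.4%, 152 B and 1.5 MiB.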

Variable types

NUM: 17
CAT: 2

Warnings

ID has a high cardinality: 3123 distinct values
householdID has a high cardinality: 6675 distinct values
wbc has 230 (2.3%) missing values
mcv has 225 (2.2%) missing values
plt has 227 (2.3%) missing values
bun has 165 (1.6%) missing values
glu has 186 (1.9%) missing values
crea has 189 (1.9%) missing values
cho has 172 (1.7%) missing values
tg has 171 (1.7%) missing values
hdl has 164 (1.6%) missing values
ldl has 182 (1.8%) missing values
crp has 163 (1.6%) missing values
hbalc has 104 (1.0%) missing values
ua has 163 (1.6%) missing values
hgb has 228 (2.3%) missing values
householdID is uniformly distributed
df_index has unique values

Reproduction

Analysis started: 2020-10-19 07:44:44.171006
Analysis finished: 2020-10-19 07:45:29.044177
Duration: 44.87 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 10027
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5071.247432
Minimum: 0
Maximum: 10136
Zeros: 1
Zeros (%): < 0.1%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 508.3
Q1: 2528.5
median: 5075
Q3: 7612.5
95-th percentile: 9630.7
Maximum: 10136
Range: 10136
Interquartile range (IQR): 5084

Descriptive statistics

Standard deviation: 2930.336323
Coefficient of variation (CV): 0.577833435
Kurtosis: -1.204010245
Mean: 5071.247432
Median Absolute Deviation (MAD): 2542
Skewness: -0.001167268458
Sum: 50849398
Variance: 8586870.968
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
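Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
1346 | 1 | < 0.1%
9550 | 1 | < 0.1%
3403 | 1 | < 0.1%
1354 | 1 | < 0.1%
7497 | 1 | < 0.1%
5448 | 1 | < 0.1%
9542 | 1 | < 0.1%
3395 | 1 | < 0.1%
7489 | 1 | < 0.1%
Other values (10017) | 10017 | 99.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
10136 | 1 | < 0.1%
10135 | 1 | < 0.1%
10134 | 1 | < 0.1%
10133 | 1 | < 0.1%
10132 | 1 | < 0.1%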
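
Several of the df_index statistics above are derived from one another, so they can be checked against each other. The sketch below recomputes the range, IQR, coefficient of variation, variance and mean from the reported figures, using the standard definitions (CV as standard deviation over mean, variance as the squared standard deviation). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Figures copied from the df_index section above.
n_observations = 10027
minimum, maximum = 0, 10136
q1, q3 = 2528.5, 7612.5
mean = 5071.247432
std = 2930.336323
total = 50849398

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
mean_from_sum = total / n_observations

print(value_range, iqr)
print(round(cv, 6), round(variance, 1), round(mean_from_sum, 4))
```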
 

ID
Categorical

HIGH CARDINALITY

Distinct: 3123
Distinct (%): 31.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.3 KiB
Common values

Value | Count | Frequency (%)
3.26E+11 | 264 | 2.6%
2.66E+11 | 239 | 2.4%
2.96E+11 | 230 | 2.3%
3.30E+11 | 193 | 1.9%
1.08E+11 | 188 | 1.9%
2.45E+11 | 183 | 1.8%
3.25E+11 | 173 | 1.7%
2.07E+11 | 155 | 1.5%
2.11E+11 | 145 | 1.4%
1.05E+11 | 145 | 1.4%
Other values (3113) | 8112 | 80.9%
 
Frequencies of value counts

Unique

Unique: 3041
Unique (%): 30.3%
Histogram of lengths of the category

Length

Max length: 11
Median length: 8
Mean length: 8.909843423
Min length: 8

householdID
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 6675
Distinct (%): 66.6%
Missing: 0
Missing (%): 0.0%
Memory size: 78.3 KiB
Common values

Value | Count | Frequency (%)
2982453020 | 2 | < 0.1%
101043010 | 2 | < 0.1%
1082593060 | 2 | < 0.1%
1682311150 | 2 | < 0.1%
3211312170 | 2 | < 0.1%
2974393190 | 2 | < 0.1%
2660591060 | 2 | < 0.1%
101042220 | 2 | < 0.1%
2660591090 | 2 | < 0.1%
3240451140 | 2 | < 0.1%
Other values (6665) | 10007 | 99.8%
 
Frequencies of value counts

Unique

Unique: 3323
Unique (%): 33.1%
Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 9.696718859
Min length: 9
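
The report lists both a "Distinct" and a "Unique" count for categorical variables. A minimal sketch of the difference follows, assuming that "Distinct" counts different values and "Unique" counts values that occur exactly once; the helper name `distinct_and_unique` and the sample identifiers are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

# Hypothetical identifiers: two values shared by two rows, one seen only once.
sample = ["h1", "h1", "h2", "h2", "h3"]
n_distinct, n_unique = distinct_and_unique(sample)
print(n_distinct, n_unique)

# Consistency check on the householdID figures reported above, assuming every
# non-unique value occurs exactly twice (the largest count in the table is 2).
reported_distinct, reported_unique, n_rows = 6675, 3323, 10027
implied_rows = reported_unique + 2 * (reported_distinct - reported_unique)
print(implied_rows == n_rows)
```

Under that reading, the householdID counts add up to the 10027 observations in the dataset.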

wbc
Real number (ℝ≥0)

MISSING

Distinct: 574
Distinct (%): 5.9%
Missing: 230
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.24178422
Minimum: 3.1
Maximum: 12.2
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 3.1
5-th percentile: 3.8
Q1: 4.99
median: 6
Q3: 7.2
95-th percentile: 9.7
Maximum: 12.2
Range: 9.1
Interquartile range (IQR): 2.21

Descriptive statistics

Standard deviation: 1.813126551
Coefficient of variation (CV): 0.2904820941
Kurtosis: 0.7906836293
Mean: 6.24178422
Median Absolute Deviation (MAD): 1.1
Skewness: 0.8676625076
Sum: 61150.76
Variance: 3.28742789
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
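Common values

Value | Count | Frequency (%)
5.6 | 240 | 2.4%
5.3 | 226 | 2.3%
5.5 | 218 | 2.2%
6.2 | 215 | 2.1%
5.1 | 215 | 2.1%
5.9 | 213 | 2.1%
5.4 | 213 | 2.1%
6.3 | 210 | 2.1%
5 | 207 | 2.1%
5.8 | 206 | 2.1%
Other values (564) | 7634 | 76.1%
(Missing) | 230 | 2.3%

Minimum 5 values

Value | Count | Frequency (%)
3.1 | 121 | 1.2%
3.12 | 1 | < 0.1%
3.13 | 2 | < 0.1%
3.14 | 1 | < 0.1%
3.15 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
12.2 | 105 | 1.0%
12.19 | 2 | < 0.1%
12.1 | 4 | < 0.1%
12 | 10 | 0.1%
11.9 | 4 | < 0.1%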
 

mcv
Real number (ℝ≥0)

MISSING

Distinct: 597
Distinct (%): 6.1%
Missing: 225
Missing (%): 2.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.66989186
Minimum: 64.4
Maximum: 109.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 64.4
5-th percentile: 73.7
Q1: 86.9
median: 91.4
Q3: 95.7
95-th percentile: 102.8
Maximum: 109.8
Range: 45.4
Interquartile range (IQR): 8.8

Descriptive statistics

Standard deviation: 8.21103214
Coefficient of variation (CV): 0.0905596331
Kurtosis: 1.188105672
Mean: 90.66989186
Median Absolute Deviation (MAD): 4.4
Skewness: -0.7072605261
Sum: 888746.28
Variance: 67.4210488
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
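Common values

Value | Count | Frequency (%)
91 | 136 | 1.4%
90 | 135 | 1.3%
89 | 132 | 1.3%
94 | 118 | 1.2%
88 | 116 | 1.2%
93 | 116 | 1.2%
95 | 112 | 1.1%
92 | 109 | 1.1%
109.8 | 101 | 1.0%
64.6 | 101 | 1.0%
Other values (587) | 8626 | 86.0%
(Missing) | 225 | 2.2%

Minimum 5 values

Value | Count | Frequency (%)
64.4 | 1 | < 0.1%
64.5 | 2 | < 0.1%
64.6 | 101 | 1.0%
64.8 | 1 | < 0.1%
64.9 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
109.8 | 101 | 1.0%
109.7 | 4 | < 0.1%
109.6 | 2 | < 0.1%
109.52 | 1 | < 0.1%
109.5 | 2 | < 0.1%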
 

plt
Real number (ℝ≥0)

MISSING

Distinct: 342
Distinct (%): 3.5%
Missing: 227
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 211.9114286
Minimum: 71
Maximum: 417
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 71
5-th percentile: 104
Q1: 163
median: 208
Q3: 255
95-th percentile: 333
Maximum: 417
Range: 346
Interquartile range (IQR): 92

Descriptive statistics

Standard deviation: 69.59708597
Coefficient of variation (CV): 0.3284253541
Kurtosis: 0.1172440802
Mean: 211.9114286
Median Absolute Deviation (MAD): 46
Skewness: 0.437624346
Sum: 2076732
Variance: 4843.754375
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
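Common values

Value | Count | Frequency (%)
417 | 104 | 1.0%
71 | 103 | 1.0%
208 | 71 | 0.7%
190 | 70 | 0.7%
180 | 69 | 0.7%
195 | 67 | 0.7%
186 | 67 | 0.7%
229 | 67 | 0.7%
241 | 67 | 0.7%
200 | 65 | 0.6%
Other values (332) | 9050 | 90.3%
(Missing) | 227 | 2.3%

Minimum 5 values

Value | Count | Frequency (%)
71 | 103 | 1.0%
72 | 2 | < 0.1%
73 | 4 | < 0.1%
74 | 5 | < 0.1%
75 | 7 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
417 | 104 | 1.0%
416 | 3 | < 0.1%
415 | 1 | < 0.1%
414 | 5 | < 0.1%
413 | 2 | < 0.1%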
 

bun
Real number (ℝ≥0)

MISSING

Distinct: 744
Distinct (%): 7.5%
Missing: 165
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.67527021
Minimum: 7.67474
Maximum: 29.46652
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 7.67474
5-th percentile: 9.5234
Q1: 12.54848
median: 15.1254
Q3: 18.2065
95-th percentile: 23.64044
Maximum: 29.46652
Range: 21.79178
Interquartile range (IQR): 5.65802

Descriptive statistics

Standard deviation: 4.323834515
Coefficient of variation (CV): 0.2758379573
Kurtosis: 0.5093365861
Mean: 15.67527021
Median Absolute Deviation (MAD): 2.77299
Skewness: 0.7227712617
Sum: 154589.5148
Variance: 18.69554492
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
7.67474 | 100 | 1.0%
29.46652 | 86 | 0.9%
15.88167 | 44 | 0.4%
14.31311 | 41 | 0.4%
13.52883 | 40 | 0.4%
14.95734 | 39 | 0.4%
16.07774 | 38 | 0.4%
12.96863 | 38 | 0.4%
15.15341 | 37 | 0.4%
15.99371 | 37 | 0.4%
Other values (734) | 9362 | 93.4%
(Missing) | 165 | 1.6%

Minimum 5 values

Value | Count | Frequency (%)
7.67474 | 100 | 1.0%
7.73076 | 3 | < 0.1%
7.75877 | 1 | < 0.1%
7.78678 | 1 | < 0.1%
7.81479 | 4 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
29.46652 | 86 | 0.9%
29.38249 | 1 | < 0.1%
29.35448 | 7 | 0.1%
29.29846 | 1 | < 0.1%
29.27045 | 1 | < 0.1%
 

glu
Real number (ℝ≥0)

MISSING

Distinct: 801
Distinct (%): 8.1%
Missing: 186
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.6849507
Minimum: 63.9
Maximum: 276.12
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 63.9
5-th percentile: 82.08
Q1: 94.5
median: 102.42
Q3: 113.76
95-th percentile: 165.24
Maximum: 276.12
Range: 212.22
Interquartile range (IQR): 19.26

Descriptive statistics

Standard deviation: 31.02302867
Coefficient of variation (CV): 0.2828376041
Kurtosis: 11.86091288
Mean: 109.6849507
Median Absolute Deviation (MAD): 9.18
Skewness: 3.063208251
Sum: 1079409.6
Variance: 962.4283079
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
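Common values

Value | Count | Frequency (%)
276.12 | 100 | 1.0%
63.9 | 89 | 0.9%
98.28 | 71 | 0.7%
96.48 | 68 | 0.7%
100.8 | 68 | 0.7%
101.88 | 66 | 0.7%
99.72 | 64 | 0.6%
101.7 | 64 | 0.6%
102.78 | 63 | 0.6%
94.5 | 63 | 0.6%
Other values (791) | 9125 | 91.0%
(Missing) | 186 | 1.9%

Minimum 5 values

Value | Count | Frequency (%)
63.9 | 89 | 0.9%
64.44 | 3 | < 0.1%
64.62 | 3 | < 0.1%
65.16 | 2 | < 0.1%
65.7 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
276.12 | 100 | 1.0%
273.42 | 2 | < 0.1%
272.7 | 1 | < 0.1%
271.98 | 1 | < 0.1%
271.26 | 3 | < 0.1%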
 

crea
Real number (ℝ≥0)

MISSING

Distinct: 84
Distinct (%): 0.9%
Missing: 189
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7773051535
Minimum: 0.4294
Maximum: 1.3673
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 0.4294
5-th percentile: 0.5311
Q1: 0.6441
median: 0.7571
Q3: 0.8814
95-th percentile: 1.1074
Maximum: 1.3673
Range: 0.9379
Interquartile range (IQR): 0.2373

Descriptive statistics

Standard deviation: 0.1787702666
Coefficient of variation (CV): 0.2299872397
Kurtosis: 0.6906530957
Mean: 0.7773051535
Median Absolute Deviation (MAD): 0.113
Skewness: 0.7354648268
Sum: 7647.1281
Variance: 0.03195880824
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
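Common values

Value | Count | Frequency (%)
0.7232 | 295 | 2.9%
0.7006 | 282 | 2.8%
0.6893 | 281 | 2.8%
0.7571 | 277 | 2.8%
0.7119 | 276 | 2.8%
0.6441 | 270 | 2.7%
0.678 | 265 | 2.6%
0.7684 | 257 | 2.6%
0.7797 | 255 | 2.5%
0.6667 | 254 | 2.5%
Other values (74) | 7126 | 71.1%

Minimum 5 values

Value | Count | Frequency (%)
0.4294 | 119 | 1.2%
0.4407 | 19 | 0.2%
0.452 | 25 | 0.2%
0.4633 | 26 | 0.3%
0.4746 | 37 | 0.4%

Maximum 5 values

Value | Count | Frequency (%)
1.3673 | 105 | 1.0%
1.356 | 5 | < 0.1%
1.3447 | 8 | 0.1%
1.3334 | 5 | < 0.1%
1.3221 | 5 | < 0.1%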
 

cho
Real number (ℝ≥0)

MISSING

Distinct: 473
Distinct (%): 4.8%
Missing: 172
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 193.2201301
Minimum: 115.98
Maximum: 300.7748
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 115.98
5-th percentile: 137.243
Q1: 167.0112
median: 190.2072
Q3: 215.3362
95-th percentile: 261.7282
Maximum: 300.7748
Range: 184.7948
Interquartile range (IQR): 48.325

Descriptive statistics

Standard deviation: 37.33860855
Coefficient of variation (CV): 0.1932438847
Kurtosis: 0.0915980175
Mean: 193.2201301
Median Absolute Deviation (MAD): 24.3558
Skewness: 0.4758168
Sum: 1904184.382
Variance: 1394.171689
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
300.7748 | 103 | 1.0%
115.98 | 90 | 0.9%
190.9804 | 61 | 0.6%
184.4082 | 58 | 0.6%
199.4856 | 57 | 0.6%
185.9546 | 54 | 0.5%
193.3 | 54 | 0.5%
178.9958 | 53 | 0.5%
179.3824 | 52 | 0.5%
184.0216 | 50 | 0.5%
Other values (463) | 9223 | 92.0%
(Missing) | 172 | 1.7%

Minimum 5 values

Value | Count | Frequency (%)
115.98 | 90 | 0.9%
116.3666 | 3 | < 0.1%
116.7532 | 5 | < 0.1%
117.1398 | 1 | < 0.1%
117.5264 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
300.7748 | 103 | 1.0%
300.3882 | 1 | < 0.1%
300.0016 | 2 | < 0.1%
299.615 | 1 | < 0.1%
299.2284 | 1 | < 0.1%
 

tg
Real number (ℝ≥0)

MISSING

Distinct: 488
Distinct (%): 5.0%
Missing: 171
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 131.0766173
Minimum: 38.055
Maximum: 540.735
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 38.055
5-th percentile: 49.56
Q1: 75.225
median: 106.2
Q3: 154.875
95-th percentile: 304.44
Maximum: 540.735
Range: 502.68
Interquartile range (IQR): 79.65

Descriptive statistics

Standard deviation: 87.05881331
Coefficient of variation (CV): 0.6641826369
Kurtosis: 6.306575944
Mean: 131.0766173
Median Absolute Deviation (MAD): 36.285
Skewness: 2.250136333
Sum: 1291891.14
Variance: 7579.236975
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
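Common values

Value | Count | Frequency (%)
38.055 | 110 | 1.1%
74.34 | 99 | 1.0%
540.735 | 95 | 0.9%
77.88 | 89 | 0.9%
79.65 | 88 | 0.9%
80.535 | 86 | 0.9%
72.57 | 86 | 0.9%
86.73 | 86 | 0.9%
85.845 | 85 | 0.8%
63.72 | 84 | 0.8%
Other values (478) | 8948 | 89.2%
(Missing) | 171 | 1.7%

Minimum 5 values

Value | Count | Frequency (%)
38.055 | 110 | 1.1%
38.94 | 18 | 0.2%
39.825 | 17 | 0.2%
40.71 | 21 | 0.2%
41.595 | 22 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
540.735 | 95 | 0.9%
539.85 | 1 | < 0.1%
538.08 | 1 | < 0.1%
533.655 | 1 | < 0.1%
530.115 | 1 | < 0.1%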
 

hdl
Real number (ℝ≥0)

MISSING

Distinct: 194
Distinct (%): 2.0%
Missing: 164
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 50.94398668
Minimum: 22.4228
Maximum: 97.0366
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 22.4228
5-th percentile: 29.7682
Q1: 40.2064
median: 49.0982
Q3: 59.923
95-th percentile: 78.0932
Maximum: 97.0366
Range: 74.6138
Interquartile range (IQR): 19.7166

Descriptive statistics

Standard deviation: 14.82688161
Coefficient of variation (CV): 0.2910428214
Kurtosis: 0.3659643037
Mean: 50.94398668
Median Absolute Deviation (MAD): 9.665
Skewness: 0.6475158664
Sum: 502460.5406
Variance: 219.8364184
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
48.325 | 130 | 1.3%
49.0982 | 126 | 1.3%
45.6188 | 122 | 1.2%
47.9384 | 120 | 1.2%
51.8044 | 115 | 1.1%
52.191 | 114 | 1.1%
45.2322 | 113 | 1.1%
47.1652 | 113 | 1.1%
44.0724 | 113 | 1.1%
44.8456 | 112 | 1.1%
Other values (184) | 8685 | 86.6%
(Missing) | 164 | 1.6%

Minimum 5 values

Value | Count | Frequency (%)
22.4228 | 106 | 1.1%
22.8094 | 7 | 0.1%
23.196 | 8 | 0.1%
23.5826 | 9 | 0.1%
23.9692 | 9 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
97.0366 | 98 | 1.0%
96.65 | 7 | 0.1%
96.2634 | 2 | < 0.1%
95.8768 | 3 | < 0.1%
95.4902 | 4 | < 0.1%
 

ldl
Real number (ℝ≥0)

MISSING

Distinct: 444
Distinct (%): 4.5%
Missing: 182
Missing (%): 1.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.2473018
Minimum: 37.8868
Maximum: 210.3104
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 37.8868
5-th percentile: 64.5622
Q1: 92.784
median: 114.047
Q3: 136.8564
95-th percentile: 177.0628
Maximum: 210.3104
Range: 172.4236
Interquartile range (IQR): 44.0724

Descriptive statistics

Standard deviation: 33.94202347
Coefficient of variation (CV): 0.2919811725
Kurtosis: 0.116226154
Mean: 116.2473018
Median Absolute Deviation (MAD): 22.0362
Skewness: 0.3448181134
Sum: 1144454.686
Variance: 1152.060957
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
210.3104 | 107 | 1.1%
37.8868 | 105 | 1.0%
112.114 | 62 | 0.6%
113.6604 | 56 | 0.6%
121.0058 | 56 | 0.6%
120.2326 | 56 | 0.6%
119.0728 | 55 | 0.5%
117.913 | 55 | 0.5%
109.4078 | 54 | 0.5%
114.8202 | 54 | 0.5%
Other values (434) | 9185 | 91.6%
(Missing) | 182 | 1.8%

Minimum 5 values

Value | Count | Frequency (%)
37.8868 | 105 | 1.0%
38.2734 | 1 | < 0.1%
38.66 | 2 | < 0.1%
39.0466 | 1 | < 0.1%
39.4332 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
210.3104 | 107 | 1.1%
209.9238 | 3 | < 0.1%
209.1506 | 4 | < 0.1%
208.764 | 1 | < 0.1%
208.3774 | 2 | < 0.1%
 

crp
Real number (ℝ≥0)

MISSING

Distinct: 1083
Distinct (%): 11.0%
Missing: 163
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.528791565
Minimum: 0.19
Maximum: 37.02
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 0.19
5-th percentile: 0.28
Q1: 0.55
median: 1.04
Q3: 2.18
95-th percentile: 8.977
Maximum: 37.02
Range: 36.83
Interquartile range (IQR): 1.63

Descriptive statistics

Standard deviation: 5.149011674
Coefficient of variation (CV): 2.036155033
Kurtosis: 26.96956375
Mean: 2.528791565
Median Absolute Deviation (MAD): 0.6
Skewness: 4.93583699
Sum: 24944
Variance: 26.51232122
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
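Common values

Value | Count | Frequency (%)
0.19 | 111 | 1.1%
37.02 | 100 | 1.0%
0.45 | 93 | 0.9%
0.39 | 92 | 0.9%
0.49 | 86 | 0.9%
0.47 | 86 | 0.9%
0.44 | 85 | 0.8%
0.4 | 82 | 0.8%
0.35 | 80 | 0.8%
0.5 | 79 | 0.8%
Other values (1073) | 8970 | 89.5%
(Missing) | 163 | 1.6%

Minimum 5 values

Value | Count | Frequency (%)
0.19 | 111 | 1.1%
0.2 | 24 | 0.2%
0.21 | 45 | 0.4%
0.22 | 37 | 0.4%
0.23 | 33 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
37.02 | 100 | 1.0%
36.7 | 1 | < 0.1%
36.35 | 1 | < 0.1%
36.2 | 2 | < 0.1%
36.1 | 1 | < 0.1%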
 

hbalc
Real number (ℝ≥0)

MISSING

Distinct: 51
Distinct (%): 0.5%
Missing: 104
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.252302731
Minimum: 4.1
Maximum: 9.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 4.1
5-th percentile: 4.5
Q1: 4.9
median: 5.1
Q3: 5.4
95-th percentile: 6.4
Maximum: 9.1
Range: 5
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.7225145682
Coefficient of variation (CV): 0.1375614859
Kurtosis: 10.73604742
Mean: 5.252302731
Median Absolute Deviation (MAD): 0.3
Skewness: 2.74000154
Sum: 52118.6
Variance: 0.5220273012
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
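Common values

Value | Count | Frequency (%)
5 | 957 | 9.5%
5.1 | 956 | 9.5%
5.2 | 939 | 9.4%
4.9 | 869 | 8.7%
5.3 | 819 | 8.2%
4.8 | 749 | 7.5%
5.4 | 682 | 6.8%
5.5 | 551 | 5.5%
4.7 | 538 | 5.4%
5.6 | 379 | 3.8%
Other values (41) | 2484 | 24.8%

Minimum 5 values

Value | Count | Frequency (%)
4.1 | 116 | 1.2%
4.2 | 57 | 0.6%
4.3 | 91 | 0.9%
4.4 | 172 | 1.7%
4.5 | 248 | 2.5%

Maximum 5 values

Value | Count | Frequency (%)
9.1 | 104 | 1.0%
9 | 6 | 0.1%
8.9 | 4 | < 0.1%
8.8 | 4 | < 0.1%
8.7 | 6 | 0.1%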
 

ua
Real number (ℝ≥0)

MISSING

Distinct: 2500
Distinct (%): 25.3%
Missing: 163
Missing (%): 1.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.443844574
Minimum: 2.142
Maximum: 8.15976
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 2.142
5-th percentile: 2.688
Q1: 3.56496
median: 4.2966
Q3: 5.16096
95-th percentile: 6.7368
Maximum: 8.15976
Range: 6.01776
Interquartile range (IQR): 1.596

Descriptive statistics

Standard deviation: 1.225451897
Coefficient of variation (CV): 0.2757638968
Kurtosis: 0.3011811232
Mean: 4.443844574
Median Absolute Deviation (MAD): 0.78204
Skewness: 0.6494059102
Sum: 43834.08288
Variance: 1.501732351
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
2.142 | 98 | 1.0%
8.15976 | 97 | 1.0%
4.06896 | 17 | 0.2%
4.27896 | 17 | 0.2%
4.41504 | 17 | 0.2%
4.11264 | 16 | 0.2%
4.15128 | 16 | 0.2%
4.63176 | 16 | 0.2%
3.63552 | 16 | 0.2%
4.45536 | 16 | 0.2%
Other values (2490) | 9538 | 95.1%
(Missing) | 163 | 1.6%

Minimum 5 values

Value | Count | Frequency (%)
2.142 | 98 | 1.0%
2.14368 | 2 | < 0.1%
2.14536 | 1 | < 0.1%
2.1504 | 1 | < 0.1%
2.15376 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
8.15976 | 97 | 1.0%
8.1564 | 1 | < 0.1%
8.15304 | 1 | < 0.1%
8.14296 | 1 | < 0.1%
8.14128 | 1 | < 0.1%
 

htc
Real number (ℝ≥0)

Distinct: 581
Distinct (%): 5.8%
Missing: 8
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.4341072
Minimum: 26.77
Maximum: 57.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 26.77
5-th percentile: 30.4
Q1: 37.8
median: 41.5
Q3: 45.2
95-th percentile: 51.4
Maximum: 57.6
Range: 30.83
Interquartile range (IQR): 7.4

Descriptive statistics

Standard deviation: 6.014562131
Coefficient of variation (CV): 0.1451596894
Kurtosis: 0.1493727421
Mean: 41.4341072
Median Absolute Deviation (MAD): 3.7
Skewness: 0.0449834957
Sum: 415128.32
Variance: 36.17495763
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
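Common values

Value | Count | Frequency (%)
30.4 | 238 | 2.4%
43 | 146 | 1.5%
40 | 146 | 1.5%
42 | 129 | 1.3%
38 | 126 | 1.3%
41 | 125 | 1.2%
44 | 121 | 1.2%
39 | 121 | 1.2%
57.6 | 112 | 1.1%
26.77 | 107 | 1.1%
Other values (571) | 8648 | 86.2%

Minimum 5 values

Value | Count | Frequency (%)
26.77 | 107 | 1.1%
26.8 | 4 | < 0.1%
26.88 | 1 | < 0.1%
26.9 | 2 | < 0.1%
26.93 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
57.6 | 112 | 1.1%
57.5 | 4 | < 0.1%
57.45 | 1 | < 0.1%
57.4 | 3 | < 0.1%
57.3 | 2 | < 0.1%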
 

hgb
Real number (ℝ≥0)

MISSING

Distinct: 126
Distinct (%): 1.3%
Missing: 228
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.34178998
Minimum: 9.3
Maximum: 21.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 9.3
5-th percentile: 11.3
Q1: 13
median: 14.2
Q3: 15.5
95-th percentile: 17.8
Maximum: 21.8
Range: 12.5
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 2.072490059
Coefficient of variation (CV): 0.1445070707
Kurtosis: 1.446860912
Mean: 14.34178998
Median Absolute Deviation (MAD): 1.2
Skewness: 0.6121117927
Sum: 140535.2
Variance: 4.295215044
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
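Common values

Value | Count | Frequency (%)
14.2 | 238 | 2.4%
13.7 | 232 | 2.3%
13.6 | 227 | 2.3%
13.9 | 226 | 2.3%
13.4 | 214 | 2.1%
14.9 | 213 | 2.1%
14 | 211 | 2.1%
13.1 | 210 | 2.1%
14.3 | 208 | 2.1%
14.5 | 206 | 2.1%
Other values (116) | 7614 | 75.9%
(Missing) | 228 | 2.3%

Minimum 5 values

Value | Count | Frequency (%)
9.3 | 102 | 1.0%
9.4 | 9 | 0.1%
9.5 | 7 | 0.1%
9.6 | 6 | 0.1%
9.7 | 9 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
21.8 | 96 | 1.0%
21.7 | 6 | 0.1%
21.6 | 3 | < 0.1%
21.5 | 6 | 0.1%
21.4 | 2 | < 0.1%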
 

age
Real number (ℝ≥0)

Distinct: 531
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 59.28314387
Minimum: 40
Maximum: 85
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.3 KiB

Quantile statistics

Minimum: 40
5-th percentile: 45.916668
Q1: 51.833332
median: 58.583332
Q3: 65.5
95-th percentile: 76.5
Maximum: 85
Range: 45
Interquartile range (IQR): 13.666668

Descriptive statistics

Standard deviation: 9.369191874
Coefficient of variation (CV): 0.1580414139
Kurtosis: -0.5344837308
Mean: 59.28314387
Median Absolute Deviation (MAD): 6.833332
Skewness: 0.3981745467
Sum: 594432.0835
Variance: 87.78175638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
53.916668 | 54 | 0.5%
56.666668 | 51 | 0.5%
59.75 | 50 | 0.5%
57.583332 | 48 | 0.5%
58.583332 | 48 | 0.5%
56.333332 | 48 | 0.5%
62.583332 | 47 | 0.5%
48.166668 | 46 | 0.5%
60.583332 | 46 | 0.5%
48.583332 | 46 | 0.5%
Other values (521) | 9543 | 95.2%

Minimum 5 values

Value | Count | Frequency (%)
40 | 2 | < 0.1%
40.166668 | 2 | < 0.1%
40.25 | 2 | < 0.1%
40.416668 | 2 | < 0.1%
40.583332 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
85 | 1 | < 0.1%
84.916664 | 1 | < 0.1%
84.833336 | 3 | < 0.1%
84.75 | 2 | < 0.1%
84.583336 | 1 | < 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
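
The calculation described above can be sketched in plain Python. This is an illustration of the textbook formula, not the code the report itself runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
# Shifting and rescaling ys by a positive factor leaves r unchanged.
r_rescaled = pearson_r(xs, [10 * y + 3 for y in ys])
print(r, r_rescaled)
```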

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
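
As a sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and an assumption here; the function names and data are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but nonlinear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this monotonic but nonlinear relationship ρ reaches 1 while Pearson's r stays below 1.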

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
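
The pair-counting definition above translates directly into code. The sketch below implements exactly that formula, the variant without a tie correction (often called tau-a); whether the report uses this variant or a tie-corrected one is not stated, so treat the choice as an assumption. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in xs or ys; it counts as neither.
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```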

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

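(index) | df_index | ID | householdID | wbc | mcv | plt | bun | glu | crea | cho | tg | hdl | ldl | crp | hbalc | ua | htc | hgb | age
0 | 0 | 10104101002 | 101041010 | 9.6 | 72.9 | 198.0 | 19.43894 | 95.94 | 0.9831 | 254.3828 | 63.720 | 63.4024 | 175.9030 | 0.98 | 4.6 | 3.59520 | 57.6 | 20.2 | 48.166668
1 | 1 | 10104102002 | 101041020 | 5.3 | 95.4 | 179.0 | 12.35241 | 94.14 | 0.8814 | 205.6712 | 57.525 | 61.4694 | 131.0574 | 0.34 | 4.9 | 5.24664 | 52.0 | 17.4 | 59.916668
2 | 2 | 10104104001 | 101041040 | 7.5 | 88.3 | 271.0 | 22.09989 | 105.84 | 1.1526 | 168.1710 | 88.500 | 47.1652 | 98.9696 | 5.67 | 4.8 | 4.55952 | 44.5 | 14.9 | 60.833332
3 | 3 | 10104105002 | 101041050 | 4.7 | 86.1 | 208.0 | 15.62958 | 87.48 | 0.6554 | 219.9754 | 91.155 | 46.0054 | 154.2534 | 2.47 | 4.6 | 3.44736 | 46.0 | 16.1 | 67.666664
4 | 4 | 10104107001 | 101041070 | 8.3 | 85.6 | 290.0 | 13.22072 | 95.76 | 0.9040 | 168.9442 | 109.740 | 40.9796 | 103.2222 | 0.76 | 4.8 | 5.20128 | 43.8 | 14.6 | 79.333336
5 | 5 | 10104108001 | 101041080 | 4.6 | 79.9 | 294.0 | 20.02715 | 89.28 | 0.6667 | 217.2692 | 102.660 | 49.0982 | 148.8410 | 1.17 | 4.8 | 3.03408 | 41.9 | 13.6 | 55.833332
6 | 6 | 10104108002 | 101041080 | 6.3 | 89.3 | 228.0 | 21.98785 | 103.86 | 0.8927 | 209.1506 | 108.855 | 49.8714 | 135.6966 | 0.45 | 4.6 | 3.78000 | 46.9 | 16.0 | 59.583332
7 | 7 | 10104109001 | 101041090 | 8.7 | 91.2 | 278.0 | 8.31897 | 94.68 | 1.0396 | 151.9338 | 110.625 | 43.6858 | 84.6654 | 0.64 | 4.5 | 5.78928 | 43.4 | 14.6 | 45.333332
8 | 8 | 10104110002 | 101041100 | 11.4 | 88.2 | 417.0 | 12.88460 | 118.08 | 0.5085 | 172.4236 | 53.985 | 51.4178 | 100.9026 | 2.71 | 4.8 | 2.32512 | 35.2 | 12.5 | 64.666664
9 | 9 | 10104111001 | 101041110 | 7.9 | 87.0 | 285.0 | 18.31854 | 96.66 | 0.6667 | 127.1914 | 132.750 | 56.0570 | 40.2064 | 0.57 | 4.4 | 4.91064 | 49.7 | 17.3 | 57.000000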

Last rows

(index) | df_index | ID | householdID | wbc | mcv | plt | bun | glu | crea | cho | tg | hdl | ldl | crp | hbalc | ua | htc | hgb | age
10017 | 10127 | 3.48E+11 | 3477633020 | 4.6 | 79.0 | 144.0 | 18.62665 | 86.58 | 0.7458 | 263.6612 | 59.295 | 95.8768 | 156.1864 | 1.73 | 4.8 | 4.28232 | NaN | 14.4 | 49.250000
10018 | 10128 | 3.48E+11 | 3477633020 | 6.6 | 90.6 | 80.0 | 19.94312 | 96.48 | 0.9153 | 230.4136 | 86.730 | 41.7528 | 173.1968 | 1.24 | 5.4 | 4.83840 | 54.3 | 20.1 | 54.166668
10019 | 10129 | 3.48E+11 | 3477633050 | 5.8 | 68.9 | 213.0 | 11.70818 | 110.34 | 0.8136 | 177.4494 | 151.335 | 35.9538 | 103.2222 | 3.80 | 5.1 | 5.05008 | 34.3 | 11.9 | 59.583332
10020 | 10130 | 3.48E+11 | 3477633050 | 8.3 | 87.8 | 197.0 | 15.65759 | 108.00 | 0.9718 | 178.6092 | 92.925 | 37.8868 | 122.9388 | 8.36 | 5.5 | 5.23152 | 45.8 | 16.5 | 67.333336
10021 | 10131 | 3.48E+11 | 3477633070 | 8.5 | 88.7 | 263.0 | 12.12833 | 106.56 | 0.7232 | 171.2638 | 64.605 | 40.5930 | 117.1398 | 5.05 | 5.9 | 4.66872 | 44.3 | 16.6 | 48.250000
10022 | 10132 | 3.48E+11 | 3477633070 | 5.4 | 88.3 | 179.0 | 12.46445 | 94.68 | 0.9379 | 176.6762 | 75.225 | 35.5672 | 127.5780 | 2.56 | 5.5 | 5.64984 | 41.2 | 17.5 | 49.750000
10023 | 10133 | 3.48E+11 | 3477633100 | 4.9 | 97.2 | 136.0 | 18.71068 | 112.86 | 1.1865 | 186.7278 | 194.700 | 27.0620 | 116.3666 | 1.52 | 5.9 | 6.58392 | 36.7 | 13.6 | 71.333336
10024 | 10134 | 3.48E+11 | 3477633100 | 4.8 | 84.4 | 197.0 | 12.71654 | 98.46 | 0.8927 | 192.9134 | 109.740 | 47.1652 | 120.2326 | 0.70 | 5.2 | 3.56496 | 42.8 | 15.9 | 61.250000
10025 | 10135 | 3.48E+11 | 3477633120 | 5.6 | 90.7 | 120.0 | 17.03008 | 107.82 | 0.8362 | 238.9188 | 149.565 | 38.2734 | 158.1194 | 0.90 | 5.1 | 4.55952 | 41.6 | 16.1 | 48.583332
10026 | 10136 | 3.48E+11 | 3477633120 | 7.7 | 98.3 | 157.0 | 18.82272 | 104.40 | 0.9153 | 180.5422 | 54.870 | 57.9900 | 105.5418 | 1.77 | 5.6 | 4.95264 | 46.6 | 17.7 | 53.833332